Evaluation method for ecology-agriculture-urban spaces based on deep learning

With the global population growing and ecological and farmland degradation escalating, challenges to the environment and livelihoods have become prominent. Coordinating urban development, food security, and ecological conservation is crucial for fostering sustainable development. This study focuses on assessing the "Ecology-Agriculture-Urban" (E-A-U) space in Yulin City, China, as a representative case. Following the Chinese framework named "environmental capacity and national space development suitability evaluation" (hereinafter referred to as "Double Evaluation"), we developed a Self-Attention Residual Neural Network (SARes-NET) model to assess the E-A-U space. Spatially, the northwest region is dominated by agriculture, while the southeast is characterized by urban and ecological areas, aligning with regional development patterns. Comparative validations with five other models, namely Logistic Regression (LR), Naive Bayes (NB), Gradient Boosting Decision Trees (GBDT), Random Forest (RF) and Artificial Neural Network (ANN), reveal that the SARes-NET model exhibits superior simulation performance, highlighting its ability to capture intricate non-linear relationships and reduce human errors in data processing. This study establishes deep learning-guided E-A-U spatial evaluation as an innovative approach for national spatial planning, holding broader implications for national-level territorial assessments.


Study area
Yulin City (Fig. 1), situated in the northernmost region of Shaanxi Province, China, is an inland city characterized by an expansive land area of 42,920.2 square kilometers. The terrain within this city is notable for its elevation variations, stretching from highlands in the northwest to lower areas in the southeast. Yulin City experiences a temperate arid and semi-arid continental monsoon climate, featuring four distinct seasons. This region comprises a northern sandy grass beach area and a southern loess hilly and gully region. In recent years, Yulin City has witnessed remarkable improvements in its ecological environment, concomitant with rapid economic growth. However, this progress has come with the challenge of balancing economic and social development with ecological protection, given the city's location at the interface between the Loess Plateau and the Mu Us Sandy Land. It falls within the category of a typical agro-pastoral ecotone, characterized by heightened ecological sensitivity and a relatively fragile environmental ecosystem. Addressing the long-term challenge of harmonizing economic and social development with the preservation of the ecological environment and guiding the layout of urban spaces will be a central concern for Yulin City. Serving as a vital energy and chemical hub in China, a key ecological

Feature engineering
Following the principles of scientific rigor, representativeness, and accessibility in feature selection, we list the essential feature factors in Table 2 (Supplementary Fig. S1 online). Drawing on prior research and accounting for the unique ecological dynamics of the Yulin region, we curated a set of 22 factors. These include pivotal indicators such as soil erosion, NDVI, and NPP. For ecological space, we selected land resources and natural resources as the assessment features 3,16,25. For agricultural space, we considered land resources, water resources, climate conditions, and environmental variables, which comprise the agricultural space evaluation features 17,27. For urban space, bound by defined development parameters, we incorporated fundamental geographical data (terrain and slope), socio-economic metrics derived from Point of Interest (POI) datasets, and climate data as the assessment features 19,20.
We designed a deep neural network model structure based on the characteristics of the data, as depicted in Fig. 2. We selected 22 features, including basic geographic data such as terrain, slope, and aspect, land resource data like soil erosion, soil texture, and land use, natural resource data like NPP, NDVI, and rainfall, locational data such as proximity to roads and rivers, socio-economic data including population density, nighttime lights, and various commercial and service aggregation points, as well as meteorological data like annual average temperature and humidity (Supplementary Fig. S1). These features form a feature vector as input to the network. The network first raises the feature dimension to 100 through a fully connected layer. The vector then passes through two self-attention residual modules that perform nonlinear computations, yielding a feature vector with higher semantic information. The vector's dimension is then reduced to 50 through a series of fully connected layers and another self-attention residual module, densifying the features. The resulting 50-dimensional feature vector is fed through a fully connected layer to calculate scores for three land-use categories. Finally, a softmax function transforms these scores into predicted probabilities for the ecological, agricultural, and urban land-use categories. The probabilities of the three categories sum to 1.
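The dimension flow described above (22 to 100, two self-attention residual modules, reduction to 50, one more module, three scores, softmax) can be traced with a small, dependency-free forward pass. This is only a sketch under stated assumptions: the weights are random and untrained, the text does not specify the internals of the self-attention residual module, so the scalar-token attention in `sa_res_block` is hypothetical, and a single fully connected layer stands in for the "series of fully connected layers". All function names are illustrative and are not taken from the authors' code.

```python
import math
import random

random.seed(0)

def linear(x, out_dim):
    """Fully connected layer with freshly drawn random weights (illustrative only)."""
    scale = 1.0 / math.sqrt(len(x))
    return [sum(random.uniform(-scale, scale) * xi for xi in x)
            for _ in range(out_dim)]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sa_res_block(x):
    """Hypothetical self-attention residual module (an assumption, not the paper's design).

    Each element of x is treated as a token with a scalar query, key and value;
    the attended output is added back to the input (residual connection).
    """
    wq, wk, wv = (random.uniform(-1.0, 1.0) for _ in range(3))
    q = [wq * v for v in x]
    k = [wk * v for v in x]
    val = [wv * v for v in x]
    out = []
    for qi in q:
        attn = softmax([qi * kj for kj in k])
        out.append(sum(a * vj for a, vj in zip(attn, val)))
    return [xi + oi for xi, oi in zip(x, out)]

def sares_net_forward(features):
    """Trace the 22 -> 100 -> 50 -> 3 dimension flow described in the text."""
    assert len(features) == 22
    h = linear(features, 100)   # raise the dimension to 100
    h = sa_res_block(h)         # first self-attention residual module
    h = sa_res_block(h)         # second self-attention residual module
    h = linear(h, 50)           # stand-in for the reducing fully connected layers
    h = sa_res_block(h)         # third self-attention residual module
    scores = linear(h, 3)       # ecological, agricultural, urban scores
    return softmax(scores)

probs = sares_net_forward([random.random() for _ in range(22)])
print(probs)  # three probabilities that sum to 1
```

A trained implementation would use learned weights and a deep learning framework; the point here is only the sequence of shapes and the final normalisation.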

Experimental design
This study integrates diverse geographical spatial data for the evaluation of ecological, agricultural, and urban space suitability (see Supplementary Fig. S1 online). First, evaluation indicators were selected, and the data were preprocessed to construct a comprehensive sample set. To assess the E-A-U space, a comparative validation was performed using six models: ANN, GBDT, LR, SARes-NET, NB, and RF. This comparison aimed to identify the best-performing model. The model selected through this validation process was then employed for the in-depth evaluation of the E-A-U space, as illustrated in Fig. 3.

Model evaluation metrics
To assess model performance comprehensively, this study combines the confusion matrix with the ROC curve and the area under it (AUC) to evaluate the classification performance of the model 44 (see Supplementary Method).
Based on the confusion matrix, we calculated the model's accuracy (ACC), precision (PRE), recall (REC), and Kappa coefficient separately for performance comparison 45. The specific meanings of these indicators are detailed in the Supplementary Method.
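As an illustration of how these four indicators follow from a confusion matrix, the sketch below uses the standard textbook definitions; the paper's own equations are in the Supplementary Method and are not reproduced here, so the formulas should be read as an assumption about what they contain. The matrix values are made up for the example, and the function name `classification_metrics` is hypothetical. Rows are taken as predicted labels and columns as true labels, as in the caption of Fig. 4.

```python
def classification_metrics(cm):
    """Compute ACC, per-class PRE and REC, and Cohen's Kappa.

    cm[i][j] is the number of samples predicted as class i whose true class is j
    (rows = predicted, columns = true).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(n)]
    pred_totals = [sum(cm[i]) for i in range(n)]                       # row sums
    true_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # column sums

    acc = sum(diag) / total
    precision = [diag[i] / pred_totals[i] for i in range(n)]
    recall = [diag[i] / true_totals[i] for i in range(n)]

    # Expected agreement by chance, from the row and column marginals.
    p_e = sum(pred_totals[i] * true_totals[i] for i in range(n)) / total ** 2
    kappa = (acc - p_e) / (1 - p_e)
    return acc, precision, recall, kappa

# Toy 3-class matrix (0 = Ecology, 1 = Agriculture, 2 = Urban); values are illustrative.
cm = [
    [50, 5, 2],
    [3, 40, 1],
    [2, 5, 30],
]
acc, precision, recall, kappa = classification_metrics(cm)
print(acc, precision, recall, kappa)
```

Kappa corrects the raw accuracy for the agreement expected by chance, which matters here because the three sample classes are strongly imbalanced.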

Implementation details
The model was trained on a 64-bit Windows 10 operating system using the Python programming language, the PyTorch 1.12.0 deep learning framework, and CUDA 11.3.

Mutual information method
Mutual information (MI), rooted in information theory, was initially introduced by Claude Shannon in his groundbreaking 1948 paper, 'A Mathematical Theory of Communication' 46, even though he did not explicitly refer to it as 'mutual information'. The term 'mutual information' was later coined by Robert Fano 47. MI is a measure designed to quantify the amount of information shared between two random variables. Mathematically, MI is the Kullback-Leibler divergence between the joint probability distribution of two random variables and the product of their marginal probability distributions. In practical applications, MI has found extensive use across various domains, particularly in machine learning and data mining. It is frequently employed as a feature selection method to assess feature importance and select the most relevant features by measuring the mutual information between features and target variables 18,48. The calculation formula is as follows:

$$MI(x_i; y) = \sum_{x_i} \sum_{y} p(x_i, y) \log \frac{p(x_i, y)}{p(x_i)\, p(y)} \tag{1}$$

In the formula, $p(x_i, y)$ represents the joint probability distribution of the two variables $x_i$ and $y$, while $p(x_i)$ and $p(y)$ represent the marginal probability distributions of $x_i$ and $y$, respectively. Here, $x_i$ denotes the i-th input feature, and $y$ represents the label for region division.
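For discrete variables, the mutual information between a feature and the label can be estimated directly from empirical frequencies. The sketch below is a minimal illustration, assuming the feature has already been discretised (continuous inputs such as rainfall would first need binning or a dedicated estimator); the function name `mutual_information` and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Estimate MI(x; y) in nats from two equally long sequences of discrete values."""
    n = len(x)
    joint = Counter(zip(x, y))   # empirical joint counts
    px = Counter(x)              # marginal counts of x
    py = Counter(y)              # marginal counts of y
    mi = 0.0
    for (xv, yv), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[xv] / n) * (py[yv] / n)))
    return mi

# A feature identical to the label carries the label's full entropy (ln 2 here).
mi_same = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
# A feature independent of the label carries no information about it.
mi_indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(mi_same, mi_indep)
```

Ranking the features by this value gives an importance ordering of the kind used for feature selection.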

Modeling validation results
To assess the accuracy of the deep learning model, we conducted validation on the constructed dataset using our network architecture. We compared the predictive performance of our model with five machine learning methods, namely, LR 32, NB 22, GBDT 33, RF 21,41, and ANN 42. This evaluation was performed to validate the feasibility of the model in predicting the suitability of the three spatial zones in Yulin City.
The trained model was used to predict on the validation set, yielding a confusion matrix (Fig. 4). Subsequently, validation set accuracy, Kappa coefficient, precision, and recall were calculated based on Eqs. (1) to (4) outlined in the Supplementary Method. As shown in Table 3, both ANN and our proposed method (SARes-NET) exhibited favorable validation results compared to the other four machine learning models, indicating superior performance of the deep learning models over traditional machine learning models. Notably, the Self-Attention Residual Neural Network designed and implemented in this study outperformed ANN, achieving an accuracy of 89.86%, a Kappa coefficient of 80.77%, precision rates for the ecological, agricultural, and urban classes of 88.90%, 90.99%, and 89.59%, respectively, and recall rates of 91.89%, 87.52%, and 92.65%. These results underscore the high simulation accuracy of the model, affirming its applicability within the context of this study.
To comprehensively assess the model's performance, this study employs the ROC curve and the AUC (Area Under the Curve) metric. The model's ROC curve is depicted in Fig. 5, and its corresponding AUC is calculated. The curve bows markedly towards the upper-left corner, far from the ROC curve of a purely random classifier (depicted by the black dashed line in the figure), which indicates superior classification performance. The SARes-NET deep learning model constructed in this study achieves an AUC value of 0.98, surpassing the other machine learning models. This result signifies excellent classification performance, close to the ideal value of 1.
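The AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way for each class in a one-vs-rest fashion. The text does not state how the three-class AUC was aggregated, so the one-vs-rest treatment is an assumption, and the function names and toy probabilities are hypothetical.

```python
def binary_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(probs, y_true, n_classes=3):
    """Per-class AUC: class c against all other classes, using its predicted probability."""
    return [
        binary_auc([p[c] for p in probs], [int(t == c) for t in y_true])
        for c in range(n_classes)
    ]

# Toy softmax outputs for six samples (0 = Ecology, 1 = Agriculture, 2 = Urban).
probs = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.5, 0.3, 0.2],  # an urban sample that the toy model gets wrong
]
y_true = [0, 0, 1, 1, 2, 2]
aucs = one_vs_rest_auc(probs, y_true)
print(aucs)  # [1.0, 1.0, 0.875]
```

The pairwise form is quadratic in the number of samples; on a validation set of realistic size a sort-based computation would be preferable.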

Spatial distribution characteristics of ecology-agriculture-urban
In this study, the output results of the three spatial categories are classified into three levels using the natural breaks method. Specifically, the evaluation of ecological space is categorized as extremely important, important, and generally important. The evaluation of agricultural space is divided into suitable, moderately suitable, and unsuitable, while the evaluation of urban space is classified as suitable, moderately suitable, and unsuitable. The findings reveal that, in terms of spatial distribution, the E-A-U space in Yulin City exhibits the following characteristics.
(1) The extremely important ecological protection area spans 8,978.90 square kilometers, representing 20.92% of the city's total land area. The category with the highest proportion is the important ecological protection area, which accounts for 50.39% of the three categories (Fig. 6). For agricultural space, the moderately suitable and unsuitable areas together cover almost the entire territory, at 40.51% and 58.11%, respectively (Fig. 7). In contrast, the suitable agricultural area is quite small, comprising only 592.30 square kilometers, or 1.38% of the total region. For urban space (Fig. 8), the second level, the moderately suitable area, is the most extensive, constituting 40.73% of the urban space. The proportions of suitable and unsuitable areas for urban space are approximately equal, both standing at about 29%.
(2) The northwestern part of the study area is predominantly suited to agricultural space; it is largely unsuitable as ecological or urban space and contains scattered regions of suitable agricultural space. This area exhibits a more evenly distributed average annual rainfall and is closer to water bodies. Land types here are primarily unused and cultivated land, characterized by relatively flat terrain with lower overall slopes compared to the southeast.
In this study, it was observed that changes in the E-A-U space are primarily influenced by land resources, natural resources, and social resources, with rainfall, annual accumulated temperature, land use type, soil texture, nighttime lighting, and other factors playing significant roles. Considering the specific conditions of the study area, it can be generally concluded that water resources are the most critical factor affecting the distribution of land space in the region. Yulin City exhibits local vulnerabilities and imbalances in the coordinated development of water resources and regions. Therefore, it is advisable to further adjust the water structure and optimize the allocation of water resources to address these disparities.
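The three-level grading by natural breaks that underlies the maps discussed above can be illustrated on a handful of values. The sketch below finds the two break points that minimise the within-class sum of squared deviations, which is the criterion behind Jenks natural breaks, by brute force. It is meant only for tiny samples; grading a full raster would call for an optimised GIS or library implementation. The function names and sample values are hypothetical.

```python
def sse(vals):
    """Sum of squared deviations from the mean of a non-empty list."""
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)

def natural_breaks_3(values):
    """Return the upper bounds of the lowest and middle of three classes.

    Tries every way of cutting the sorted values into three non-empty groups
    and keeps the cut with the smallest total within-class variance.
    """
    v = sorted(values)
    n = len(v)
    best = None
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            cost = sse(v[:i]) + sse(v[i:j]) + sse(v[j:])
            if best is None or cost < best[0]:
                best = (cost, v[i - 1], v[j - 1])
    return best[1], best[2]

def classify(value, breaks):
    """Map a value to level 0, 1 or 2 given the two break points."""
    low, mid = breaks
    if value <= low:
        return 0
    return 1 if value <= mid else 2

# Toy predicted probabilities for one space type at nine grid cells.
sample = [0.05, 0.08, 0.10, 0.45, 0.50, 0.55, 0.90, 0.92, 0.95]
breaks = natural_breaks_3(sample)
print(breaks)                                # (0.1, 0.55)
print([classify(p, breaks) for p in sample])
```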
This study employs deep learning as an approach for suitability assessment, and the constructed deep learning model achieves an accuracy of 89.86% and a Kappa coefficient of 80.77% on the validation set. Compared with traditional expert ratings or other subjective weighting methods, this enhances the objectivity of the evaluation results, reinforcing the scientific and practical aspects of suitability assessment. Furthermore, a comparative validation is conducted with five other deep learning and machine learning models, including ANN and GBDT 21,32,33,41,50. The validation results indicate that the deep learning models outperform the traditional machine learning models. In particular, the SARes-NET model designed and implemented in this study exhibits superior simulation performance compared to ANN, demonstrating its applicability in this context. The model captures complex nonlinear relationships between evaluation factors and suitability levels, reducing human errors in data processing. This contributes to strengthening the practical application of the outcomes of "Land Suitability Assessment" in guiding the scientific allocation of land space elements and exploring suitability assessment methods tailored for urban scales.
Nonetheless, there are limitations in this study, primarily related to data collection. The set of indicators acquired for land spatial suitability evaluation is not exhaustive; in particular, certain data such as soil heavy metal information and atmospheric data are absent. Furthermore, the study faces data limitations when it comes to water resources data, such as groundwater data and total water consumption. Additionally, the study is confined to a spatial analysis based on 2020 data, and lacks an examination of the temporal dimension. In the future, we aim to address these issues and enhance the comprehensiveness and depth of our research by expanding data sources and considering both spatial and temporal dimensions.

Conclusions
In the context of ecological civilization construction, land space planning necessitates the scientific, rational, intensive, and efficient utilization of land resources. Land suitability evaluation should adhere to ecological security, resource efficiency, and sustainability standards. In land spatial planning, suitability evaluation serves as a tool for analyzing the foundational conditions of land space. A scientifically sound suitability evaluation can offer robust support for guiding land space planning.
This study drew inspiration from the big data research paradigm and harnessed multi-source data reflecting socio-economic aspects, including nighttime lights and points of interest (POI), to expand the land space suitability evaluation index system. The Self-Attention Residual Neural Network (SARes-NET) method was compared with five artificial intelligence methods to validate its accuracy and applicability. This approach effectively resolves issues related to multicollinearity among evaluation factors, offering a novel concept for optimizing the spatial pattern of land at the city and county levels. It also provides fresh insights into clarifying zoning evaluation factors and scientific zoning methods.
The northwestern region of the study area is predominantly characterized by agricultural space, offering opportunities for adjustments in the agricultural structure in alignment with economic development and regional features. In the southeast, urban space and ecological space predominate, suggesting potential for the development of ecological agricultural areas. These research findings enhance our comprehension of territorial spatial planning and offer valuable insights for the sustainable development and ecological preservation of Yulin City.
The outcomes of this research offer a fresh perspective on coordinating urban development, ensuring food security, safeguarding ecological integrity, and achieving sustainability. They hold significance as a reference for land space planning and management in other regions and contribute to the advancement of territorial space planning practices, ultimately promoting sustainable development.

Figure 1. Study area is the City of Yulin, Shaanxi, China: (a) Administrative map of China, (b) Administrative map of Shaanxi Province, (c) Digital elevation map of Yulin City. Note that we use QGIS software (https://qgis.org/en/site/, version: 3.34) for plotting.

Figure 2. Diagram of the Self-Attention Residual Neural Network (SARes-NET). The 22 features form a feature vector. The network raises the feature dimension to 100, applies two self-attention residual modules for nonlinear computations, and then reduces the dimension to 50. This densified feature vector is processed by fully connected layers to calculate scores for three land-use categories, which a softmax function transforms into predicted probabilities.

Implementation details (continued): CUDA 11.3 was used. The GPU model was an NVIDIA RTX A4000 (16 GB VRAM), and the CPU model was an 8-core, 16-thread Intel(R) Xeon(R) W-2245 CPU @ 3.90 GHz with a total memory size of 64 GB.

Figure 3. Proposed workflow of our work: We compared SARes-NET with five other models to select the optimal one for spatial evaluation of the three spaces.

Figure 4. Model Validation Confusion Matrix: (a) Artificial Neural Network (ANN); (b) Gradient Boosting Decision Trees (GBDT); (c) Logistic Regression (LR); (d) Self-Attention Residual Neural Network (SARes-NET); (e) Naive Bayes (NB); (f) Random Forest (RF). Note that we use '0' as the label for Ecology, '1' as the label for Agriculture, and '2' as the label for Urban. Each row of the matrix corresponds to a predicted label (ecological, agricultural, or urban space). Each column of the matrix corresponds to a true label. The values on the diagonal represent the number of predictions consistent with the ground truth for each category. Compared to other models, our model has the highest number of correct predictions, indicating superior performance of the model.

Figure 5. ROC curves of the algorithm models for the E-A-U space (ANN, our method: SARes-NET, GBDT, RF, LR, NB). An ROC curve closer to the upper left corner indicates a larger area under the curve and better model performance.

Figure 6. The City of Yulin with the mapped distribution of ecological importance levels in three categories: extremely important area, important area and generally important area, drawn with QGIS software (https://qgis.org/en/site/, version: 3.34).

Figure 8. The City of Yulin with the mapped distribution of urban suitability grades in three categories: suitable area, moderately suitable area and unsuitable area, drawn with QGIS software (https://qgis.org/en/site/, version: 3.34).

Figure 9. The City of Yulin with the mapped distribution of the national territorial spatial pattern in six grades: ecological conservation extremely important area, ecological conservation important area, agricultural suitable area, agricultural moderately suitable area, urban suitable area and urban moderately suitable area, drawn with QGIS software (https://qgis.org/en/site/, version: 3.34).

Figure 10. Ranking chart of driving factor importance for the E-A-U space. Note that AP denotes Average precipitation, LUT denotes Land use types, AAAT denotes Average annual accumulated temperature, NL denotes Night lighting, DOHR denotes Distribution of healthcare regions, DOEI denotes Distribution of educational institutions, PD denotes Population density, SS denotes Scenic spots, WI denotes Wetness index, DOCE denotes Distribution of catering establishment, SE denotes Soil erosion, DTO denotes Distance to roads, SA denotes Slope aspect, DOHGHA denotes Distribution of high-risk Geological hazards area, DTR denotes Distance to river.

Table 1. Data type and sources, comprising basic geographic, land resource, natural resource, location conditions, socio-economic, and meteorological data.

Table 2. Ecological, Agricultural, Urban space evaluation index and feature engineering selection.

Logistic regression: A linear model employed for solving classification problems. It models the probability of an event occurrence by taking the logarithm of the odds (log-odds), which is a linear combination of one or more independent variables. It performs well in exploring linear relationships 32.
(2) Naive Bayes: A simple and effective classification algorithm based on Bayes' theorem. It leverages the assumption of independence between features to assign an instance to the category with the highest prob-
Artificial Neural Network (ANN): A model designed based on neural network architecture, capable of complex nonlinear modeling through multiple layers of neurons. Suitable for tasks such as handling large-scale data, image processing, and natural language processing, it exhibits powerful fitting capabilities 5,42,43.
(6) Self-Attention Residual Neural Network (SARes-NET)

Scientific Reports | (2024) 14:11353 | https://doi.org/10.1038/s41598-024-61919-1

(1) Aggregation of multisource spatial data: Uniformly aggregate the data to a spatial resolution of 90 m × 90 m grids, constructing a feature matrix. The entire study area is subdivided into 5,298,221 units.
(2) E-A-U spatial sample division: The ecological protection red line, nature reserves, and drinking water sources are vital ecological areas designated by the Chinese government. In this study, these areas serve as ecological space samples (1,052,712). Agricultural space samples (983,306) are selected within permanent basic farmland, while urban space samples (60,961) focus on the existing built-up areas of cities, acknowledging their limited potential for change.
(3) Division of dataset based on stratified sampling: To ensure consistency between the test and training samples, a 7:3 stratified sampling is employed to divide the samples into training and testing sets. The training set is used for model training, while the testing set is employed for model validation and performance evaluation.
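The 7:3 stratified division in step (3) can be sketched as follows. Splitting within each class keeps the class proportions of the training and testing sets the same, which matters because urban samples are far rarer than ecological or agricultural ones. The function name `stratified_split`, the seed, and the toy label counts are hypothetical; they only mimic the imbalance of the real sample set at a much smaller scale.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices into train/test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                        # randomise within each class
        cut = round(train_frac * len(idxs))      # 70% of this class goes to training
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return train, test

# Toy labels: 0 = ecological, 1 = agricultural, 2 = urban (deliberately imbalanced).
labels = [0] * 100 + [1] * 90 + [2] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 140 60
```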